1. Fundamental Problem of Causal Inference
- causal hypotheses
- independent/dependent variables
2. Correlation
- scatterplots
- problems with correlation
March 15, 2022
In February 2019, Donald Trump held a rally in El Paso, TX, where he argued that migrants were dangerous.
In August 2019, an armed man killed 22 people at a Walmart in El Paso, TX. In advance of his attack, he issued a manifesto that stated he was motivated in response to an alleged “Hispanic invasion of Texas.”
A causal claim:
“Trump’s rally in El Paso increased the likelihood of hate crimes against immigrants.”
Causal claim implies…
Counterfactual claim:
“If Trump had not held a rally in El Paso (in 2019), then there would have been fewer hate crimes against immigrants.”
Potential Outcomes:
\(\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally}) >\) \(\color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\)
\(\mathrm{Black}\) is factual; \(\color{red}{\mathrm{Red}}\) is counterfactual
We make causal claims testable by translating them into statements about potential outcomes / the relationship between independent (cause) and dependent (outcome) variables.
For instance:
Independent variable:
The variable capturing the alleged cause in a causal claim.
Dependent variable:
The variable capturing the alleged outcome (what is affected) in a causal claim.
Potential outcomes are the values of the dependent variable that a case would take if exposed to different values of the independent variable.
“Trump rallies in a community increase the likelihood of hate crimes against immigrants.”
What could be an independent variable used to test this causal claim?
What could be a dependent variable used to test this causal claim?
| \(\mathrm{City}_i\) | \(\mathrm{Rally}_i\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{Rally})\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{No \ Rally})\) |
|---|---|---|---|
| El Paso | Yes | > 1 | ? |
How would we find the “\(?\)”?
If Trump’s rally caused hate crimes to increase in El Paso, we would expect to see this:
\(\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally}) >\) \(\color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\)
The value in \(\mathrm{Black}\) is factual; the value in \(\color{red}{\mathrm{Red}}\) is counterfactual and can never be known.
What behaviors cause a person to become wealthy?
Can we learn anything from this evidence?
We cannot say anything about causality if:
We cannot observe: \(\color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\)
But we can observe, e.g.: \(\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})\)
If we assume: \(\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})\) \(=\) \(\color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\)
Then, we can test our causal claim, to see if:
\(\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally}) >\) \(\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})\)
| \(\mathrm{City}_i\) | \(\mathrm{Rally}_i\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{Rally})\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{No \ Rally})\) |
|---|---|---|---|
| El Paso | Yes | \(\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally})\) | \(\color{red}{\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{No \ Rally})}\) |
| \(\mathbf{\Uparrow}\) | |||
| Austin | No | \(\color{red}{\mathrm{Hate \ Crimes}_{Austin}(\mathrm{Rally})}\) | \(\boxed{\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})}\) |
| \(\mathrm{City}_i\) | \(\mathrm{Rally}_i\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{Rally})\) | \(\mathrm{Hate \ Crimes}_i(\mathrm{No \ Rally})\) |
|---|---|---|---|
| El Paso | Yes | \(\mathrm{Hate \ Crimes}_{El \ Paso}(\mathrm{Rally})\) | \(\boxed{\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})}\) |
| \(\mathbf{\Uparrow}\) | |||
| Austin | No | \(\color{red}{\mathrm{Hate \ Crimes}_{Austin}(\mathrm{Rally})}\) | \(\mathrm{Hate \ Crimes}_{Austin}(\mathrm{No \ Rally})\) |
Every solution to the FPCI involves:
Comparing the observed values of outcome \(Y\) in cases that actually have different values of cause \(X\)
Making assumptions that let us treat factual (observed) potential outcomes from some cases as equivalent to counterfactual (unobserved) potential outcomes of other cases.
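The two ingredients above can be illustrated with a minimal simulation sketch. Everything in it is hypothetical: the cities, the outcome scale, and `TRUE_EFFECT` are invented for illustration and are not estimates from real data. Because it is a simulation, both potential outcomes exist for every case, so we can check how far the observed comparison is from the truth when the assumption roughly holds.

```python
import random

random.seed(0)

# Hypothetical toy world: every city has BOTH potential outcomes.
# A real analyst would only see the one matching the city's actual X.
TRUE_EFFECT = 2.0  # assumed for illustration only

cities = []
for _ in range(1000):
    y_no_rally = random.gauss(10, 3)       # Y_i(No Rally)
    y_rally = y_no_rally + TRUE_EFFECT     # Y_i(Rally)
    had_rally = random.random() < 0.5      # X_i, assigned independently of outcomes
    cities.append((had_rally, y_rally, y_no_rally))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Ingredient 1: compare observed Y across cases with different X.
observed_rally = mean(y1 for x, y1, y0 in cities if x)
observed_no_rally = mean(y0 for x, y1, y0 in cities if not x)
estimate = observed_rally - observed_no_rally

# Ingredient 2: the assumption that observed No Rally outcomes stand in
# for the counterfactual No Rally outcomes of the rally cities.
# Only a simulation lets us look at the counterfactual directly.
counterfactual_no_rally = mean(y0 for x, y1, y0 in cities if x)
assumption_gap = observed_no_rally - counterfactual_no_rally

print(f"difference in observed means: {estimate:.2f}")
print(f"gap between stand-in and counterfactual: {assumption_gap:.2f}")
```

The error in the estimate is exactly the size of the gap in the assumption: if the stand-in cases differ from the counterfactual, the comparison is off by that amount.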
Correlation is the degree of association/relationship between the observed values of \(X\) (the independent variable) and \(Y\) (the dependent variable)
All empirical evidence for causal claims relies on correlation between the independent and dependent variables.
But, you’ve all heard this:

How do we turn correlation into evidence of causation?
Many different ways of assessing correlation.
data from this paper
mathematically: correlation is the degree of linear association between \(X\) and \(Y\)
negative correlation: (\(< 0\)) values of \(X\) and \(Y\) move in opposite directions:
positive correlation: (\(> 0\)) values of \(X\) and \(Y\) move in the same direction:
It is possible to see a perfect correlation but a small change in \(Y\) across \(X\)
It is possible to see a weak correlation but a large change in \(Y\) across \(X\)
It is possible to see a perfect nonlinear relationship between \(X\) and \(Y\) with \(0\) correlation
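A small sketch of these three possibilities, using made-up numbers chosen only to make the arithmetic clean. The helper names `pearson_r` and `slope` are illustrative, and the slope is assumed to be the least-squares slope of \(Y\) on \(X\).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: degree of linear association between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def slope(xs, ys):
    """Least-squares slope of y on x (assumed definition of 'change in Y across X')."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


xs = [-3, -2, -1, 0, 1, 2, 3]

# Perfect correlation, tiny change in Y across X.
flat = [0.001 * x for x in xs]

# Weak correlation, large change in Y across X (hypothetical noise values).
noise = [60, -90, 60, -60, 60, -90, 60]
steep_noisy = [10 * x + e for x, e in zip(xs, noise)]

# Perfect nonlinear relationship, zero correlation.
parabola = [x * x for x in xs]

for name, ys in [("flat", flat), ("steep_noisy", steep_noisy), ("parabola", parabola)]:
    print(f"{name}: r = {pearson_r(xs, ys):.3f}, slope = {slope(xs, ys):.3f}")
```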
weak correlation: values for \(X\) and \(Y\) do not cluster along line
strong correlation: values for \(X\) and \(Y\) cluster strongly along a line
strength of correlation does not determine the slope of line describing \(X,Y\) relationship
effect size: this is the slope of the line describing the \(X,Y\) relationship. The larger the effect, the steeper the slope
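As a worked formula, assuming the line in question is the least-squares line of \(Y\) on \(X\) (the fitting method is an assumption here), the correlation \(r\) and the slope \(b\) are related as follows, where \(s_X\) and \(s_Y\) are the sample standard deviations:

\[
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}},
\qquad
b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = r \cdot \frac{s_Y}{s_X}
\]

Because \(s_Y / s_X\) depends on the units and spread of the two variables, the same \(r\) is compatible with a steep or a shallow slope.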
Does this correlation prove that Trump rallies caused hate crimes? Why or why not?
Does this correlation prove that Nick Cage caused drownings? Why or why not?
random association: correlations between \(X\) and \(Y\) occur by chance and do not reflect any real relationship between them
bias (spurious correlation, confounding): \(X\) and \(Y\) are correlated, but the correlation does not result from a causal relationship between those variables
Solving these problems involves making assumptions: what are those assumptions? how plausible are they?
Arbitrary processes can make seemingly-strong patterns.
If you look long enough at pure chaos, you might find a strong correlation
To see that random patterns can emerge, I use random number generators to draw values of \(X\) and \(Y\).
We can imagine these are the observed \(X\) and \(Y\) for \(5\) cases.
How easy is it to find a strong correlation?
\(\#\) Tries to get correlation \(> 0.9\): 54
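A rough sketch of how such a count could be produced. The chance procedure is an assumption: each try draws independent standard normal values of \(X\) and \(Y\) for \(5\) cases, which may not match the generator behind the number above. The function name `tries_until` and the seed are illustrative, and the count will vary with the seed.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def tries_until(threshold, n_cases=5, rng=None):
    """Draw unrelated X and Y repeatedly; count tries until r exceeds threshold."""
    rng = rng or random.Random()
    tries = 0
    while True:
        tries += 1
        xs = [rng.gauss(0, 1) for _ in range(n_cases)]
        ys = [rng.gauss(0, 1) for _ in range(n_cases)]  # generated independently of xs
        if pearson_r(xs, ys) > threshold:
            return tries


n_tries = tries_until(0.9, n_cases=5, rng=random.Random(1))
print(f"Tries to get correlation > 0.9 with 5 cases: {n_tries}")
```

With so few cases, a strong correlation between unrelated variables turns up after a modest number of tries.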
What do we do about this problem?
The field of statistics investigates the properties of chance events:
This procedure works…
we know the chance processes that could generate this correlation
Tries to get correlation \(> 0.9\): 252
Tries to get correlation \(> 0.9\): 623553
Tries to get correlation \(> 0.45\): 22
statistical significance:
An indication of how likely it is that the correlation we observe could have happened purely by chance.
A higher degree of statistical significance indicates that the correlation is unlikely to have happened by chance.
\(p\) value:
A numerical measure of statistical significance. It puts a number on how likely it is that a correlation at least as strong as the observed one would have occurred by chance, assuming we know the chance procedure and the true correlation is \(0\).
It is a probability, so is between \(0\) and \(1\).
Lower \(p\)-values indicate greater statistical significance
\(p < 0.05\) is often used as the threshold for a “significant” result.
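One way to make the definition concrete is a permutation sketch. The chance procedure is an assumption chosen for illustration: shuffling \(Y\) breaks any link to \(X\), so the true correlation is \(0\) by construction. It is not necessarily the procedure used elsewhere in this lecture. The data and the name `permutation_p_value` are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # chance procedure with true correlation 0
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_shuffles


# Hypothetical data: Y rises with X, plus an alternating disturbance.
xs = list(range(20))
ys = [x + (3 if x % 2 == 0 else -3) for x in xs]

p_value = permutation_p_value(xs, ys)
print(f"observed r = {pearson_r(xs, ys):.2f}, p = {p_value:.4f}")
```

The result is a proportion, so it always lies between \(0\) and \(1\); a small value means few shuffles produced a correlation as strong as the observed one.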
\(p\) value:
Be wary of “\(p\)-hacking”
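A hedged sketch of why this matters: if we test many candidate variables that are unrelated to the outcome, some will fall below \(p < 0.05\) by chance alone. The setup is entirely hypothetical (random numbers standing in for variables), and the \(p\)-value comes from an assumed shuffle procedure rather than any particular textbook test.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def shuffle_p_value(xs, ys, rng, n_shuffles=200):
    """Share of shuffles of ys whose |r| with xs is at least the observed |r|."""
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / n_shuffles


rng = random.Random(2022)
n_cases, n_candidates = 15, 40

# One outcome and many candidate "causes", all generated independently.
outcome = [rng.gauss(0, 1) for _ in range(n_cases)]
false_positives = 0
for _ in range(n_candidates):
    candidate = [rng.gauss(0, 1) for _ in range(n_cases)]
    if shuffle_p_value(candidate, outcome, rng) < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_candidates} unrelated variables had p < 0.05")
```

Reporting only the candidates that cleared the threshold, without saying how many were tried, would make chance findings look like real ones.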
| Statistical Significance | \(p\)-value | By Chance? | Why? | “Real”? |
|---|---|---|---|---|
| Low | High (\(p > 0.05\)) | Likely | small \(N\), weak correlation | Probably not |
| High | Low (\(p < 0.05\)) | Unlikely | large \(N\), strong correlation | Probably |